macpie

Introduction

MAC-seq is a cost-effective, high-throughput transcriptomic platform, developed as a collaboration betwee VCFG and MGC core facilities at PeterMac Cancer Centre, primarily designed for small molecule screening. However, its versatility extends beyond this application, thanks to its integration with high-throughput microscopy and 3D cell culturing techniques.

macpie is a toolkit specifically developed for researchers working with MAC-seq data. Its primary aim is to deliver the latest tools for quality control (QC), visualization, and analysis. Macpie is a result of a collaborative effort by a workgroup at the Peter MacCallum Cancer Centre (PeterMac), with a substantial support of the VCFG amd MGC core facilities.

MAC-seq data shares similarities with single-cell RNA-seq data as it derives gene counts from RNA sequences measured at the 3’ ends of polyA transcripts. This results in sparse data, with many transcripts undetectable in individual samples, and a high throughput of transcriptomes analyzed per experiment. However, MAC-seq also parallels standard RNA-seq, as it typically includes replicate conditions and involves hundreds rather than thousands of samples per experiment, measured at higher depth.

macpie was designed to offer users a collection of best-practice analytical methods, drawing from both scRNA-seq and RNA-seq methodologies, while ensuring ease of use for non-experts.

1. Metadata import and QC

To ensure the integrity of metadata for future analyses, we provide the user with a set of tools to verify metadata consistency and visualize the key variables. Metadata has to be in a tabular format and contain the key columns for sample description as in the example dataset below.

Key points:


# Clean environment
rm(list = ls(all.names = TRUE)) # will clear all objects including hidden objects
gc() # free up memory and report the memory usage
<<<<<<< HEAD
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  603359 32.3    1369002 73.2   709760 38.0
#> Vcells 1126825  8.6    8388608 64.0  1875678 14.4
=======
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  614701 32.9    1395385 74.6         NA   721805 38.6
#> Vcells 1155586  8.9    8388608 64.0     102400  2031695 15.6
>>>>>>> 2f6d7a6 (Fix vignette)

# Load all functions
devtools::load_all()

library(macpie)
library(Seurat)
library(edgeR)
library(dplyr)
library(leidenbase)
library(gridExtra)
library(enrichR)
library(parallel)
library(mcprogress)
library(ggsci)
library(ggiraph)
library(pheatmap)
library(tidyr)
library(tibble)
library(enrichR)
library(variancePartition)
library(glmGamPoi)
library(PoiClaClu)
library(Matrix)


# Define project variables
project_name <- "PMMSq033"
project_metadata <- system.file("extdata/PMMSq033/PMMSq033_metadata_drugnames.csv", package = "macpie")

# Load metadata
metadata <- read_metadata(project_metadata)
metadata$Time <- as.factor(metadata$Time)
metadata$Concentration_1 <- as.factor(metadata$Concentration_1)
colnames(metadata)
#>  [1] "Plate_ID"        "Well_ID"         "Row"             "Column"         
#>  [5] "Species"         "Cell_type"       "Model_type"      "Time"           
#>  [9] "Unit"            "Treatment_1"     "Concentration_1" "Unit_1"         
#> [13] "Sample_type"     "Barcode"         "Project"

# Validate metadata
validate_metadata(metadata)
#> 
#> Validation Issues:
#> Model_type: Contains special characters.
#> 
#> Generating summary table grouped by Plate_ID...
#>   Plate_ID count_Species count_Cell_type count_Model_type count_Time count_Unit
#> 1 PMMSq033             1               3                3          1          1
#>   count_Treatment_1 count_Concentration_1 count_Unit_1 count_Sample_type
#> 1                36                     4            1                 3

First, let’s visually inspect the large number of experimental variables, in order to correct artefacts and other metadata errors.


plot_metadata_heatmap(metadata)

2. Sequencing data import and QC

Key points:

2.1 Import data to tidySeurat object

Data is imported into a tidySeurat object, which allows the usage of both the regular Seurat functions, as well as the functionality of tidyverse.


project_rawdata <- system.file("extdata/PMMSq033/raw_matrix", package = "macpie")
raw_counts_total <- Read10X(data.dir = project_rawdata)
keep <- rowSums(cpm(raw_counts_total) >= 10) >= 2
raw_counts <- raw_counts_total[keep, ]

# Create tidySeurat object
mac <- CreateSeuratObject(counts = raw_counts,
                          project = project_name,
                          min.cells = 1,
                          min.features = 1)
# Calculate percent of mitochondrial and ribosomal genes
mac[["percent.mt"]] <- PercentageFeatureSet(mac, pattern = "^mt-|^MT-")
mac[["percent.ribo"]] <- PercentageFeatureSet(mac, pattern = "^Rp[slp][[:digit:]]|^Rpsa|^RP[SLP][[:digit:]]|^RPSA")

# Join with metadata
mac <- mac %>%
  inner_join(metadata, by = c(".cell" = "Barcode"))

# Add unique identifier
mac <- mac %>%
  mutate(combined_id = str_c(Treatment_1, Concentration_1, sep = "_")) %>%
  mutate(combined_id = gsub(" ", "", .data$combined_id))

# Example of a function from Seurat QC 
VlnPlot(mac, features = c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.ribo"), 
        ncol = 4, group.by = "Sample_type") & 
  scale_fill_manual(values = macpie_colours$discrete) 

This allows us to apply all of the tidyverse functions to manipulate the Seurat object. For example, let’s subset the Seurat object based on the column “Project” in metadata and visualise the grouping of data on the plate vs on an MDS plot. Plate layout plots are useful for visualising any spatial anomalies or unexpected patterns.

unique(mac$Project)
#> [1] "Trial"   "Current"
mac <- mac %>%
  filter(Project == "Current")

# QC plot plate layout (all metadata columns can be used):
p <- plot_plate_layout(mac, "nCount_RNA", "combined_id")
girafe(ggobj = p, 
  fonts = list(sans = "sans"),
  options = list(
    opts_hover(css = "stroke:black; stroke-width:0.8px;")  # <- slight darkening
  ))
<<<<<<< HEAD ======= >>>>>>> 2f6d7a6 (Fix vignette)

2.2 Basic QC metrics

2.2.1 Sample grouping with MDS plot

As a first step, we should visualise grouping of the samples based on top 500 expressed genes and limma’s MDS function. As a warning, samples that are treated with a lower concentration of compound will often cluster close to the negative (vehicle) control.

p <- plot_mds(mac, group_by = "Sample_type", label = "combined_id", n_labels = 30)
girafe(ggobj = p, fonts = list(sans = "sans"))
<<<<<<< HEAD ======= >>>>>>> 2f6d7a6 (Fix vignette)

Since we are operating from a standard Seurat object, we can also use a regular scRNA-seq workflow too.

mac <- SCTransform(mac) %>%
  RunPCA() %>%
  RunUMAP(dims = 1:30)
DimPlot(mac, reduction = "umap", group.by = "Sample_type", cols = macpie_colours$discrete)
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)
2.2.2 Distribution of read counts

We also want to expect what is the distribution of reads across the experiment. To that end we use box plot to show distribution of read counts grouped across treatments. Read count is commonly directly proportional to the number of cells.

qc_stats <- compute_qc_metrics(mac, group_by = "combined_id", order_by = "median")
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)
qc_stats$stats_summary
#> # A tibble: 83 × 6
#>    combined_id       sd_value mad_value group_median z_score    IQR
#>    <chr>                <dbl>     <dbl>        <dbl>   <dbl>  <dbl>
#>  1 5-azacytidine_0.1    2596.     3044.        51399  -0.318  2578.
#>  2 5-azacytidine_1      3740.     3222.        55824   0.143  3642.
#>  3 5-azacytidine_10     4003.     1918.        45863  -0.896  3744.
#>  4 Abemaciclib_0.1      5614.     5309.        61901   0.777  5503 
#>  5 Abemaciclib_1        3824.     4596.        52934  -0.158  3802.
#>  6 Abemaciclib_10      15634.    10562.        49929  -0.472 14964 
#>  7 Adavosertib_0.1      6519.     6166.        54307  -0.015  6390 
#>  8 Adavosertib_1        8640.     7828.        53658  -0.083  8444.
#>  9 Adavosertib_10       9935.    11819.        43491  -1.14   9874.
#> 10 Anastrozole_0.1      1691.     2114.        58986   0.473  1684.
#> # ℹ 73 more rows
2.2.3 Variability among all replicates

In relation to the previous plot, we want the user to have the ability to assess the dispersion of reads within a sample. To that end, we have allowed the user to have access to statistical metrics such as standard deviation (sd_value), z score (z_score), mad (mad_value) and IQR (IQR) which can be used as a parameter to the function plot_qc_metrics individually, or assessed at once with the function plot_qc_metrics_heatmap.

As you can see below, Staurosporine had the largest variability between the samples across all metrics.


plot_qc_metrics_heatmap(qc_stats$stats_summary)
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)
2.2.4 Variability within a sample

Due to the lower read counts per sample, MAC-seq is more variable than RNA-seq. It is therefore fairly important to estimate bioogical variability between the replicates. We provide a way to estimate inter-replicate variability using poisson distance within the function plot_distance.

plot_distance(mac, "combined_id", treatment = "DMSO_0")
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)
2.2.5 Estimate of the batch effect

Several methods are available for scaling and normalizing transcriptomic data, with their effects most clearly visualized using RLE (Relative Log Expression) plots. In our case, limma_voom provides the lowest average coefficient of variation, when compared to other methods such as “raw”, Seurat “SCT” or “edgeR”.

# First we will subset the data to look at control, DMSO samples only
mac_dmso <- mac %>%
  filter(Treatment_1 == "DMSO")

# Run the RLE function
plot_rle(mac_dmso, label_column = "Row", normalisation = "limma_voom")
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)

3. Differential gene expression

Key points:

3.1. Single comparison

Similar to scRNA-seq data, MAC-seq gene expression counts have an excess of zero counts compared to bulk RNA-seq. Statistical models assuming a Poisson or negative binomial distribution may not fit the data well. For that reason, differential expression effects could be over or underestimated.

Let’s first perform the differential expression analysis with a couple of methods and visualise the results on a volcano plot.

# First perform the differential expression analysis
treatment_samples <- "Staurosporine_0.1"
control_samples <- "DMSO_0"
top_table <- compute_single_de(mac, treatment_samples, control_samples, method = "limma_voom")

# Let's visualise the results with a volcano plot
plot_volcano(top_table)
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)

Based on the results, we can quickly check gene expression levels in counts per million (CPM) for selected genes between treatment and control samples as described below.

genes <- top_table$gene[1:6]
group_by <- "combined_id"
plot_cpm(mac,genes, group_by, treatment_samples, control_samples)
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)

Some plotting functions also have a “summarise” version that provides collapsed versions of the results in a table format.

summarise_de(top_table, lfc_threshold = 1, padj_threshold = 0.05)
#> # A tibble: 1 × 6
#>   Total_genes_tested Significantly_upregulated Significantly_downregulated
#>                <int>                     <int>                       <int>
#> 1              25519                       526                         212
#> # ℹ 3 more variables: Total_significant <int>, Padj_threshold <dbl>,
#> #   Log2FC_threshold <dbl>

3.2. Pathway analysis, single comparison

Differential gene expression results for individual comparisons of treatment vs control can be easily performed with functions from package enrichR and fgsea. In the following case, the effect of Staurosporine on breast cancer cells through Myc inactivation can be observed through pathway enrichment analyses. If you check the data from “DisGeNET”, you will see that our mcf7 (breast cancer cell line) samples are correctly enriched for breast cancer profiles.


top_genes <- top_table %>%
  filter(p_value_adj < 0.05) %>%
  select(gene) %>%
  pull()

enriched <- enrichR::enrichr(top_genes, c("MSigDB_Hallmark_2020","DisGeNET",
                                 "RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO"))
#> Uploading data to Enrichr... Done.
#>   Querying MSigDB_Hallmark_2020... Done.
#>   Querying DisGeNET... Done.
#>   Querying RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO... Done.
#> Parsing results... Done.
p1 <- enrichR::plotEnrich(enriched[[1]]) + 
  macpie_theme(legend_position_ = 'right') + 
  scale_fill_gradientn(colors = macpie_colours$continuous)

gridExtra::grid.arrange(p1, ncol = 1)
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)

3.3. Differential gene expression - multiple comparisons

Since MAC-seq is commonly used for high-throughput screening of compound libraries, we often want to compare multiple samples in a screen vs the control. This process can easily be parallelised. First we select a vector of “treatments” as combined_ids that do not contain the word “DMSO”. (Warning, parallelisation speedup currently only works on OSX and Linux machines, and not on Windows)

mac$combined_id <- make.names(mac$combined_id)

treatments <- mac %>%
  filter(Concentration_1 == 10) %>%
  select(combined_id) %>%
  filter(!grepl("DMSO", combined_id)) %>%
  pull() %>%
  unique()
mac <- compute_multi_de(mac, treatments, control_samples = "DMSO_0", method = "limma_voom", num_cores = 2)

We often want to ask what are the genes are differentially expressed in more than one treatment group.

Here, we can visualise treatment groups with shared differentially expressed genes. The definition of shared diffrentially expressed genes are the top 5 DE genes from each single drug comparison (treatment vs control) that are found in at least 2 different treatment group.

This heatmap shows shared differentially expressed genes with corresponding log2FC values.

plot_multi_de(mac, group_by = "combined_id", value = "log2FC", p_value_cutoff = 0.01, direction="up", n_genes = 5, control = "DMSO_0", by="fc")
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette) If you prefer to see at the expression level for each replicate, you can do it by specifying logCPM - “lcpm”. As we’re checking log CPM, we can also check our control DMSO_0.

plot_multi_de(mac, group_by = "combined_id", value = "lcpm", p_value_cutoff = 0.01, direction="up", n_genes = 5, control = "DMSO_0", by="fc")
<<<<<<< HEAD

=======

>>>>>>> 2f6d7a6 (Fix vignette)

Summarise the outputs from multi de comparison in a table

summarise_de(mac, lfc_threshold = 1, padj_threshold = 0.01, multi=TRUE)
#> # A tibble: 27 × 7
#>    combined_id  Total_genes_tested Significantly_upregu…¹ Significantly_downre…²
#>    <chr>                     <int>                  <int>                  <int>
#>  1 Abemaciclib…              25519                     28                     47
#>  2 Adavosertib…              25519                    294                     99
#>  3 Anastrozole…              25519                      0                      1
#>  4 Azd.5991_10               25519                      0                      0
#>  5 Camptotheci…              25519                   1708                   1808
#>  6 Capivaserti…              25519                     22                     19
#>  7 Ceralaserti…              25519                    183                     89
#>  8 Chlorambuci…              25519                      0                      1
#>  9 Cytarabine_…              25519                      8                      5
#> 10 Erlotinib_h…              25519                    232                    141
#> # ℹ 17 more rows
#> # ℹ abbreviated names: ¹​Significantly_upregulated, ²​Significantly_downregulated
#> # ℹ 3 more variables: Total_significant <int>, padj_threshold <dbl>,
#> #   Log2FC_threshold <dbl>

Pathway enrichment analysis is done by using enrichR.

Summarise the outputs from multi de comparison in a table


# Load genesets from enrichr for a specific species or define your own
enrichr_genesets <- download_geneset("human", "MSigDB_Hallmark_2020")
mac <- compute_multi_enrichr(mac, genesets = enrichr_genesets)

enriched_pathways_mat <- mac@tools$pathway_enrichment %>%
  bind_rows() %>%
  select(combined_id, Term, Combined.Score) %>%
  pivot_wider(names_from = combined_id, values_from = Combined.Score) %>%
  column_to_rownames(var = "Term") %>%
  mutate(across(everything(), ~ ifelse(is.na(.), 0, log1p(.)))) %>%  # Replace NA with 0 across all columns
  as.matrix()

<<<<<<< HEAD

pheatmap(enriched_pathways_mat, color = macpie_colours$continuous_rev)

Quick check of some treatments:

Nutlin.3a is a MDM2-P53 inhibitor and stablises the p53 protein. It induces cell autophagy and apotopsis. Nutlin-activated p53 induces G1 and G2 arrest in cancer cell lines (see in the pathway enrichment heatmap).

Ref: Tovar C, et al. Proc Natl Acad Sci USA. 2006;103(6):1888–1893. Shows Nutlin-3’s effect on various p53 targets in cancer cell lines.

======= pheatmap(enriched_pathways_mat, color = macpie_colours$continuous_rev) + macpie_theme()

#> NULL
>>>>>>> 2f6d7a6 (Fix vignette)

4. Methods for compound screening

Key points:

4.1. Profile Similarity

<<<<<<< HEAD

mac <- compute_multi_screen_profile(mac, target = "Staurosporine_10", num_cores = 1)
mac_screen_profile <- mac@tools$screen_profile %>%
  mutate(logPadj = c(-log10(padj))) %>%
  arrange(desc(NES)) %>%
  mutate(target = factor(target, levels = unique(target))) 

ggplot(mac_screen_profile, aes(target, NES)) +
  #geom_point(aes(size = logPadj)) +
  geom_point() +
  facet_wrap(~pathway, scales = "free") +
  macpie_theme(x_labels_angle = 90, show_x_title = F)

=======

mac <- compute_multi_screen_profile(mac, target = "Staurosporine_10", num_cores = 1)
mac_screen_profile <- mac@tools$screen_profile %>%
  mutate(logPadj = c(-log10(padj))) %>%
  arrange(desc(NES)) %>%
  mutate(target = factor(target, levels = unique(target))) 

ggplot(mac_screen_profile, aes(target, NES)) +
  #geom_point(aes(size = logPadj)) +
  geom_point() +
  facet_wrap(~pathway, scales = "free") +
  macpie_theme(x_labels_angle = 45, show_x_title = F)

>>>>>>> 2f6d7a6 (Fix vignette)

5. Multi-plates analysis

This section walks through QC and differential expression analysis of two MAC-seq plates with matched layouts.

Key points:

5.1 Import data and QC

Here we import the raw count matrix and metadata for the first plate, apply QC filters, and generate the tidySeurat object.

Load PMMSq033 and PMMSq034

project_metadata <- system.file("extdata/PMMSq033/PMMSq033_metadata_drugnames.csv", package = "macpie")
metadata <- read_metadata(project_metadata)

pmm33_in <- system.file("extdata/PMMSq033/raw_matrix", package = "macpie")
pmm34_in <- system.file("extdata/PMMSq034/raw_matrix", package = "macpie")
raw_counts_total <- Read10X(data.dir = c(pmm33_in,pmm34_in))
keep <- rowSums(cpm(raw_counts_total) >= 10) >= 2
raw_counts <- raw_counts_total[keep, ]
combined <- CreateSeuratObject(counts=raw_counts,
                               min.cells = 1,
                          min.features = 1)

combined[["percent.mt"]] <- PercentageFeatureSet(combined, pattern = "^mt-|^MT-")
combined[["percent.ribo"]] <- PercentageFeatureSet(combined, pattern = "^Rp[slp][[:digit:]]|^Rpsa|^RP[SLP][[:digit:]]|^RPSA")

#join with metadata
combined$Barcode <- str_replace_all(rownames(combined@meta.data),"[1|2]_","")

combined <- combined %>%
  inner_join(metadata, by = c("Barcode" = "Barcode"))

combined <- combined%>%
  filter(Project == "Current")

#add unique identifier
combined <- combined %>%
  mutate(combined_id = str_c(Treatment_1, Concentration_1, sep = "_")) %>%
  mutate(combined_id = gsub(" ", "", .data$combined_id))

combined$combined_id <- make.names(combined$combined_id)
# meta <- meta %>% mutate(Treatment_1 =  ifelse(grepl("^Ada|^Shenali|^EMPTY$", Treatment_1), Treatment_1, paste0("ML_", Treatment_1)))

combined <- combined %>% mutate(orig.ident = ifelse(grepl("1", orig.ident), "PMMSq033","PMMSq034"))

Plate layout for both plates:

p <- plot_plate_layout(combined, "nCount_RNA", "combined_id") + facet_wrap(~orig.ident)
girafe(ggobj = p, 
  fonts = list(sans = "sans"),
  options = list(
    opts_hover(css = "stroke:black; stroke-width:1px;")  # <- slight darkening
  ))

We visualize sample grouping across plates using MDS (multidimensional scaling) plots and assess expression variability using RLE plots.

combined_dmso <- combined %>%
  filter(Treatment_1 == "DMSO")
plot_mds(combined, group_by = "orig.ident", label = "combined_id", n_labels = 30)
<<<<<<< HEAD

plot_mds(combined_dmso, group_by = "orig.ident", label = "combined_id", n_labels = 30)

RLE plots to show before and after using limma_voom for batch correction.

plot_rle(combined_dmso, label_column = "orig.ident", normalisation = "raw") + scale_x_discrete(drop = FALSE) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot_rle(combined_dmso, label_column = "orig.ident", normalisation = "limma_voom") + scale_x_discrete(drop = FALSE) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

=======

plot_mds(combined_dmso, group_by = "orig.ident", label = "combined_id", n_labels = 30)

plot_rle(combined_dmso, label_column = "orig.ident", normalisation = "raw") + scale_x_discrete(drop = FALSE) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot_rle(combined_dmso, label_column = "orig.ident", normalisation = "edgeR") + scale_x_discrete(drop = FALSE) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

>>>>>>> 2f6d7a6 (Fix vignette)

5.2 Single comparison DE analysis

We run differential expression using edgeR between one treatment condition and control (e.g. Staurosporine_0.1 vs DMSO_0), while accounting for plate effects via batch modeling.

First, we look at the volcano plot if no batch correction is used.

treatment_samples <- "Staurosporine_0.1"
control_samples <- "DMSO_0"
<<<<<<< HEAD
combined_lv_nobatch <- compute_single_de(combined, treatment_samples, control_samples, method =  "limma_voom")
plot_volcano(combined_lv_nobatch, max.overlaps = 10)

Then, we compare with the volcano plot after batch correction.

treatment_samples <- "Staurosporine_0.1"
control_samples <- "DMSO_0"
subset <- combined[, grepl(paste0(treatment_samples, "|", control_samples), combined$combined_id)]
batch <- subset$orig.ident
combined_lv <- compute_single_de(combined, treatment_samples, control_samples, method = "limma_voom", batch = batch)
plot_volcano(combined_lv)

Check CPM levels for top 6 differentially expressed genes.

genes <- combined_lv$gene[1:6]
group_by <- "combined_id"
plot_cpm(combined,genes, group_by, treatment_samples, control_samples)

Summarise the outputs from single comparison in a table

summarise_de(combined_lv, lfc_threshold = 1, padj_threshold = 0.01, multi=FALSE)
#> # A tibble: 1 × 6
#>   Total_genes_tested Significantly_upregulated Significantly_downregulated
#>                <int>                     <int>                       <int>
#> 1              25736                     13168                         222
#> # ℹ 3 more variables: Total_significant <int>, Padj_threshold <dbl>,
#> #   Log2FC_threshold <dbl>
======= subset <- combined[, grepl(paste0(treatment_samples, "|", control_samples), combined$combined_id)] batch <- subset$orig.ident combined_edgeR <- compute_single_de(combined, treatment_samples, control_samples, method = "edgeR", batch = batch) plot_volcano(combined_edgeR)

Check CPM levels for top 6 differentially expressed genes.

genes <- combined_edgeR$gene[1:6]
group_by <- "combined_id"
plot_cpm(combined,genes, group_by, treatment_samples, control_samples)

Summarise the outputs from single comparison in a table

summarise_de(combined_edgeR, lfc_threshold = 1, padj_threshold = 0.01, multi=FALSE)
#> # A tibble: 1 × 6
#>   Total_genes_tested Significantly_upregulated Significantly_downregulated
#>                <int>                     <int>                       <int>
#> 1              25598                       151                          49
#> # ℹ 3 more variables: Total_significant <int>, Padj_threshold <dbl>,
#> #   Log2FC_threshold <dbl>
>>>>>>> 2f6d7a6 (Fix vignette)

5.3 Multi-comparison DE analysis

We compare all treatment conditions against the DMSO control. This is especially useful for screening multiple drugs across the plates.

For visualisation purpose, we only use high concentration treatments in this workshop.

treatments <- combined %>%
  filter(Concentration_1 == 10) %>%
  select(combined_id) %>%
  filter(!grepl("DMSO", combined_id)) %>%
  pull() %>%
  unique()
combined <- compute_multi_de(combined, treatments, control_samples = "DMSO_0", method = "edgeR", num_cores = 2, batch = batch)

Summarise the outputs from single comparison in a table

<<<<<<< HEAD
summarise_de(combined, lfc_threshold = 1, padj_threshold = 0.01, multi=TRUE)
#> # A tibble: 27 × 7
#>    combined_id  Total_genes_tested Significantly_upregu…¹ Significantly_downre…²
#>    <chr>                     <int>                  <int>                  <int>
#>  1 Abemaciclib…              25736                    127                     24
#>  2 Adavosertib…              25736                    125                     23
#>  3 Anastrozole…              25736                      0                      0
#>  4 Azd.5991_10               25736                      1                      0
#>  5 Camptotheci…              25736                    829                    521
#>  6 Capivaserti…              25736                    126                     32
#>  7 Ceralaserti…              25736                    125                     63
#>  8 Chlorambuci…              25736                      0                      0
#>  9 Cytarabine_…              25736                    196                     38
#> 10 Erlotinib_h…              25736                    594                    322
#> # ℹ 17 more rows
#> # ℹ abbreviated names: ¹​Significantly_upregulated, ²​Significantly_downregulated
#> # ℹ 3 more variables: Total_significant <int>, padj_threshold <dbl>,
#> #   Log2FC_threshold <dbl>

Similar to single plate analysis, we can also use this heatmap shows shared differentially expressed genes with corresponding log2FC values.

plot_multi_de(combined, group_by = "combined_id", value = "log2FC", p_value_cutoff = 0.01, direction="up", n_genes = 5, control = "DMSO_0", by="fc")

=======
summarise_de(combined, lfc_threshold = 1, padj_threshold = 0.01, multi=TRUE)
#> # A tibble: 82 × 7
#>    combined_id  Total_genes_tested Significantly_upregu…¹ Significantly_downre…²
#>    <chr>                     <int>                  <int>                  <int>
#>  1 Abemaciclib…              25598                      2                      0
#>  2 Abemaciclib…              25598                     76                    190
#>  3 Abemaciclib…              25598                    125                     51
#>  4 Adavosertib…              25598                     15                      0
#>  5 Adavosertib…              25598                     51                      4
#>  6 Adavosertib…              25598                    136                     48
#>  7 Anastrozole…              25598                      0                      0
#>  8 Anastrozole…              25598                      0                      0
#>  9 Anastrozole…              25598                      0                      0
#> 10 Azd.5991_0.1              25598                      3                      2
#> # ℹ 72 more rows
#> # ℹ abbreviated names: ¹​Significantly_upregulated, ²​Significantly_downregulated
#> # ℹ 3 more variables: Total_significant <int>, padj_threshold <dbl>,
#> #   Log2FC_threshold <dbl>

Similar to single plate analysis, we can also use this heatmap shows shared differentially expressed genes with corresponding log2FC values.

plot_multi_de(combined, group_by = "combined_id", value = "log2FC", p_value_cutoff = 0.01, direction="up", n_genes = 5, control = "DMSO_0", by="fc")

>>>>>>> 2f6d7a6 (Fix vignette)